Using Validation to Avoid Overfitting in Boosting

Authors

  • Tom Bylander
  • Lisa Tate
  • Leslie Pack Kaelbling
Abstract

AdaBoost is a well-known, effective technique for increasing the accuracy of learning algorithms. However, it has the potential to overfit the training set because it focuses on misclassified examples, which may be noisy. We demonstrate that overfitting in AdaBoost can be alleviated in a time-efficient manner using a combination of dagging and validation sets. The training set is partitioned into subsets, and AdaBoost is run on each subset, generating multiple hypotheses. The hypotheses are then applied to a validation set, consisting of the entire training set, which is used to adjust the weights of the hypotheses. Finally, the hypotheses generated from all subsets are aggregated with a weighted plurality vote for classification. We show that our algorithm achieves similar performance on standard datasets and improved performance when classification noise is added. We also apply validation sets to another subset training algorithm, the BB algorithm.
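To make the procedure in the abstract concrete, below is a minimal sketch in Python with scikit-learn. The partition into subsets, the use of the full training set as the validation set, and the weighted plurality vote follow the abstract; the specific re-weighting rule (AdaBoost's log-odds formula recomputed on the validation set), the function names, and the decision to drop worse-than-chance hypotheses are assumptions for illustration, not the authors' exact method.

```python
# Sketch of dagging + validation-set re-weighting, assuming details noted above.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def dagging_adaboost_fit(X, y, n_subsets=4, n_rounds=50, seed=None):
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    hypotheses, weights = [], []
    # Partition the training set into disjoint subsets (dagging).
    for part in np.array_split(order, n_subsets):
        # Run AdaBoost on each subset, generating multiple weak hypotheses.
        booster = AdaBoostClassifier(n_estimators=n_rounds).fit(X[part], y[part])
        for h in booster.estimators_:
            # Re-weight each hypothesis on the validation set, which per the
            # abstract is the entire training set. The log-odds rule below is
            # an assumed choice mirroring AdaBoost's own weight formula.
            err = np.clip(np.mean(h.predict(X) != y), 1e-10, 1 - 1e-10)
            hypotheses.append(h)
            weights.append(0.5 * np.log((1.0 - err) / err))
    return hypotheses, np.array(weights)

def dagging_adaboost_predict(hypotheses, weights, X, classes):
    # Weighted plurality vote over all hypotheses from all subsets.
    votes = np.zeros((len(X), len(classes)))
    for h, w in zip(hypotheses, weights):
        pred = h.predict(X)
        for j, c in enumerate(classes):
            # Assumed: hypotheses with negative weight (validation error
            # above 1/2) simply cast no vote.
            votes[pred == c, j] += max(w, 0.0)
    return classes[np.argmax(votes, axis=1)]
```

A typical call would be hypotheses, weights = dagging_adaboost_fit(X_train, y_train) followed by dagging_adaboost_predict(hypotheses, weights, X_test, np.unique(y_train)); the paper's actual weight-adjustment and voting details may differ from this sketch.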

Related Articles

Functional Frank-Wolfe Boosting for General Loss Functions

Boosting is a generic learning method for classification and regression. Yet, as the number of base hypotheses becomes larger, boosting can lead to a deterioration of test performance. Overfitting is an important and ubiquitous phenomenon, especially in regression settings. To avoid overfitting, we consider using l1 regularization. We propose a novel Frank-Wolfe type boosting algorithm (FWBoost...

Avoiding Boosting Overfitting by Removing Confusing Samples

Boosting methods are known to exhibit noticeable overfitting on some datasets, while being immune to overfitting on other ones. In this paper we show that standard boosting algorithms are not appropriate in the case of overlapping classes. This inadequacy is likely to be the major source of boosting overfitting when working with real-world data. To verify our conclusion we use the fact that an...

Probing for Sparse and Fast Variable Selection with Model-Based Boosting

We present a new variable selection method based on model-based gradient boosting and randomly permuted variables. Model-based boosting is a tool to fit a statistical model while performing variable selection at the same time. A drawback of the fitting lies in the need of multiple model fits on slightly altered data (e.g., cross-validation or bootstrap) to find the optimal number of boosting it...

A Fast Scheme for Feature Subset Selection to Avoid Overfitting in AdaBoost

AdaBoost is a well known, effective technique for increasing the accuracy of learning algorithms. However, it has the potential to overfit the training set because its objective is to minimize error on the training set. We show that with the introduction of a scoring function and the random selection of training data it is possible to create a smaller set of feature vectors. The selection of th...

Using Validation Sets to Avoid Overfitting in AdaBoost

AdaBoost is a well known, effective technique for increasing the accuracy of learning algorithms. However, it has the potential to overfit the training set because its objective is to minimize error on the training set. We demonstrate that overfitting in AdaBoost can be alleviated in a time-efficient manner using a combination of dagging and validation sets. Half of the training set is removed ...


Publication date: 2007